OLAP or On Line Analytical Processing functions are knows as “Ordered” analytical functions in Teradata environment. These functions provides support for many decision support operations in data mining environments that require an ordered set of results set or depend on values from a previous row. These functions are categorized as follows:
Advantages of Analytical Functions: Why someone would like to use analytical functions without making any sense of it, let’s discuss the reasons.
- Window aggregate functions
- Rank function
- Distribution function
- Row number function
- Teradata SQL-specific functions
Advantages of Analytical Functions: Why someone would like to use analytical functions without making any sense of it, let’s discuss the reasons.
- Faster Processing: – As these functions are database specific and processed by query execution engine, processing takes lesser time than in case of using external sorting mechanism.
- Elimination of the need to use external tools which might need exporting large data sets to another environment. Thus coding effort is less too.
SUM/CSUM :– These functions are used to compute a cumulative sum of a particular group of rows. SUM also can be used to simply calculate group sum. For moving sum use MSUM.
COUNT : – To calculate cumulative or moving count.
AVG : – Similarly to compute the moving average use the AVG or MAVG function.
MDIFF : – To see the difference between the current row (column) and the preceding nth row (column) value. If you want to see the sales numbers (increasing or decreasing) on a daily basis, use this function.
MLINREG : – To project the next value in a series based on the data pattern present in the series.
QUANTILE : – To divide the result set into partitions with equal number of rows present in each partition.
RANK : – This function is used to display the ordered rank of all rows in a particular group.
PERCENT_RANK:- To find out relative rank of a row in a group use PERCENT_RANK.
ROW_NUMBER:- To get the sequential row number of the row within its data subset.
MAX/MIN: – To calculate the maximum or minimum cumulative value in a group.
RANK ():- As the function name implies, RANK returns ranking (ordered) of rows based on the number or expression given in the ORDER BY clause. Note that we do not need to give anything inside the RANK function. Just give it to the ORDER BY clause and things would be taken care. Let’s work out a scenario.
Scenario is to find out the sales figure for each store based on their actual sales.
SELECT
store_no
,sale_month
,actual_sale,
RANK () OVER (PARTITION BY store_no ORDER BY actual_sale) as SaleRank
FROM sales;
Note that there is nothing being fed to the function RANK. Result set below shows the sales figure ranked, for each store.
store_no sale_month actual_sale SaleRank
10 05/01/2007 13,500 1
10 01/01/2007 13,500 1
20 02/01/2007 11,000 1
20 04/01/2007 12,500 2
30 02/01/2007 10,000 1
30 03/01/2007 10,500 2
30 03/01/2007 12,500 3
40 02/01/2007 11,500 1
40 04/01/2007 12,500 2
50 01/01/2007 10,500 1
50 05/01/2007 12,500 2
60 06/01/2007 11,500 1
70 07/01/2007 11,500 1
80 07/01/2007 15,500 1
90 06/01/2007 15,500 1
ROW_NUMBER ():- It returns the sequential number of the row within a group, starting with 1 based on the ORDER BY clause. With the above scenario if we replace the RANK with ROW_NUMBER and run the following query:
SELECT
store_no
,sale_month
,actual_sale,
ROW_NUMBER() OVER (PARTITION BY store_no ORDER BY actual_sale) as SaleRank
FROM sales;
The result set we get is similar, only difference is for first two rows we have 1,1 with RANK but 1,2 with ROW_NUMBER. This is because ROW_NUMBER takes into consideration the sequential occurrence of the row in a group which is store_no here.
store_no sale_month actual_sale SaleRank
10 05/01/2007 13,500 1
10 01/01/2007 13,500 2
20 02/01/2007 11,000 1
20 04/01/2007 12,500 2
30 02/01/2007 10,000 1
30 03/01/2007 10,500 2
30 03/01/2007 12,500 3
40 02/01/2007 11,500 1
40 04/01/2007 12,500 2
50 01/01/2007 10,500 1
50 05/01/2007 12,500 2
60 06/01/2007 11,500 1
70 07/01/2007 11,500 1
80 07/01/2007 15,500 1
90 06/01/2007 15,500 1
COUNT ():- As you know when you use this function as aggregate function, it would just return the total number of rows present in the SELECT statement. But the same function when used with OLAP, returns the cumulative or moving count for an expression, let’s see how this is done. Let’s also use the same data set present in the initial article. Scenario is to find out, number of stores selling a particular product.
This is called grouping count and the same can be done in two ways one with taking store_no as count, partitioning by prod_code and the other ways to reverse the logic of taking prod_code as count and partition by store_no.
The following queries would give the desired result.
SELECT store_no, sale_month, actual_sale, prod_code,
COUNT (prod_code) OVER (PARTITION BY store_noROWS BETWEEN UNBOUNDED PRECEDING ANDUNBOUNDED FOLLOWING)
FROM sales;
SELECT store_no, sale_month, actual_sale, prod_code,
COUNT (store_no) OVER (PARTITION BY prod_codeROWS BETWEEN UNBOUNDED PRECEDING ANDUNBOUNDED FOLLOWING)
FROM sales;
store_no sale_month actual_sale prod_code Group Count(store_no)
10 01/01/2007 13,500 Apple 2
60 06/01/2007 11,500 Apple 2
10 05/01/2007 13,500 Apricot 2
40 04/01/2007 12,500 Apricot 2
70 07/01/2007 11,500 Banana 2
20 02/01/2007 11,000 Banana 2
50 05/01/2007 12,500 Grapes 2
80 07/01/2007 15,500 Grapes 2
50 01/01/2007 10,500 Guava 2
40 02/01/2007 11,500 Guava 2
30 03/01/2007 12,500 Mango 2
20 04/01/2007 12,500 Mango 2
30 03/01/2007 10,500 Orange 2
90 06/01/2007 15,500 Orange 2
30 02/01/2007 10,000 Strawberry 1
New Terms:-
ROWS BETWEEN: – This clause is to specify the start and end of the aggregation group.The default aggregation group is ROWS BETWEEN UNBOUNDED PRECEDING ANDUNBOUNDED FOLLOWING (take out this portion from the above query and run).
UNBOUNDED PROCEEDING: – When this is specified, the entire partition that precedes the current row is taken into the aggregation group.
UNBOUNDED FOLLOWING: – Used to define the entire partition that follows the current row.
SUM ():- When used with partition clause, SUM () function returns the cumulative or moving sum of an expression based on how the aggregation group is specified. This function can also be used in place of CSUM and MSUM which are Teradata-specific functions and are discouraged to a great extent. As per recent recommendations from Teradata, usage of ANSI-compliant window function for any new applications is advised. Lets see why Teradata suggests to go for ANSI-compliant SUM () function.
• If you are using SUM function and want to calculate the cumulative SUM, then just specify ORDER BY clause with ROWS UNBOUNDED PRECEDING which would give the same result as that of CSUM function.
• To compute moving average using SUM window function use ORDER BY clause and specify ROWS number PRECEDING (number of rows preceding the current row).
Let’s work out some examples using the data present in the previous section.
Scenario 1:- Calculate the cumulative actual sales per store ordered by sale month:
SELECT store_no, sale_month, actual_sale,
SUM (actual_sale) OVER (PARTITION BY store_no ORDER BY sale_month ROWS UNBOUNDED PRECEDING) as StoreTotal
FROM sales;ORDER BY store_no, sale_month;
Result Set:-
store_no sale_month actual_sale StoreTotal
10 01/01/2007 13,500 13,500
10 05/01/2007 13,500 27,000
20 02/01/2007 11,000 11,000
20 04/01/2007 12,500 23,500
30 02/01/2007 10,000 10,000
30 03/01/2007 12,500 22,500
30 03/01/2007 10,500 33,000
40 02/01/2007 11,500 11,500
40 04/01/2007 12,500 24,000
50 01/01/2007 10,500 10,500
50 05/01/2007 12,500 23,000
60 06/01/2007 11,500 11,500
70 07/01/2007 11,500 11,500
80 07/01/2007 15,500 15,500
90 06/01/2007 15,500 15,500
Scenario 2:- Calculate the total actual sales of fruits by category per each store:
SELECT store_no, prod_code, actual_sale,
SUM (actual_sale) OVER (PARTITION BY prod_code ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as FruitTotal
FROM sales;
Result would be as follows:
store_no prod_code actual_sale FruitTotal
60 Apple 11,500 25,000
10 Apple 13,500 25,000
40 Apricot 12,500 26,000
10 Apricot 13,500 26,000
20 Banana 11,000 22,500
70 Banana 11,500 22,500
50 Grapes 12,500 28,000
80 Grapes 15,500 28,000
50 Guava 10,500 22,000
40 Guava 11,500 22,000
20 Mango 12,500 25,000
30 Mango 12,500 25,000
90 Orange 15,500 26,000
30 Orange 10,500 26,000
30 Strawberry 10,000 10,000
Scenario 3:- Calculate the moving actual sales per month per store.
SELECT store_no, sale_month, actual_sale,
SUM (actual_sale) OVER (PARTITION BY store_no, sale_month ORDER BY actual_sale ROWS 1 PRECEDING) as SaleTrend
FROM sales;
And the Result would be:
store_no sale_month actual_sale SaleTrend
10 01/01/2007 13,500 13,500
10 05/01/2007 13,500 13,500
20 02/01/2007 11,000 11,000
20 04/01/2007 12,500 12,500
30 02/01/2007 10,000 10,000
30 03/01/2007 10,500 10,500
30 03/01/2007 12,500 23,000
40 02/01/2007 11,500 11,500
40 04/01/2007 12,500 12,500
50 01/01/2007 10,500 10,500
50 05/01/2007 12,500 12,500
60 06/01/2007 11,500 11,500
70 07/01/2007 11,500 11,500
80 07/01/2007 15,500 15,500
90 06/01/2007 15,500 15,500
Let’s work out some examples of OLAP functions that are being discussed in the previous articles. Before that make sure you have a table and there is data present init. Follow the below instruction to create a temporary table and insert data into it. While inserting data you might encounter one of the following errors:
3520: A constant value in a query is not valid for column sale_month.
3535: A character string failed conversion to a numeric value.
2665: Invalid date.
All of these errors are due to the date value that you are inserting. If you encounter any of these errors, try permutation and combination of the date value and see if you can get rid of the error. If you still can not then find out from your sys admin what date format in allowed while you are inserting dates. Here in this example the format is ‘yyyy-mm-dd’. All of the below statements are tested in SQL assistant. Create a Volatile Table “Sales” as follows:
CREATE SET VOLATILE TABLE <username>.sales,
NO FALLBACK,
CHECKSUM = DEFAULT,
LOG
(
“store_no” INTEGER,
“sale_month” DATE,
“prod_code” CHAR (20) CHARACTER SET LATIN NOT CASESPECIFIC,
“projected_sale” INTEGER,
“actual_sale” INTEGER
)
PRIMARY INDEX ( “store_no” )
ON COMMIT PRESERVE ROWS;
Insert data as follows:
INSERT INTO sales VALUES(10,’2007-01-01′,’Apple’,15000,13500);
INSERT INTO sales VALUES(20,’2007-02-01′,’Banana’,10000,11000);
INSERT INTO sales VALUES(30,’2007-03-01′,’Orange’,11000,10500);
INSERT INTO sales VALUES(40,’2007-04-01′,’Apricot’,12500,12500);
INSERT INTO sales VALUES(50,’2007-05-01′,’Grapes’,15000,12500);
INSERT INTO sales VALUES(60,’2007-06-01′,’Apple’,12000,11500);
INSERT INTO sales VALUES(70,’2007-07-01′,’Banana’,15500,11500);
INSERT INTO sales VALUES(80,’2007-07-01′,’Grapes’,15000,15500);
INSERT INTO sales VALUES(90,’2007-06-01′,’Orange’,16000,15500);
INSERT INTO sales VALUES(10,’2007-05-01′,’Apricot’,14500,13500);
INSERT INTO sales VALUES(20,’2007-04-01′,’Mango’,15500,12500);
INSERT INTO sales VALUES(30,’2007-03-01′,’Mango’,15000,12500);
INSERT INTO sales VALUES(40,’2007-02-01′,’Guava’,14000,11500);
INSERT INTO sales VALUES(50,’2007-01-01′,’Guava’,12000,10500);
INSERT INTO sales VALUES(30,’2007-02-01′,’Strawberry’,12000,10000);
Now take the example of AVG () function. As you know AVG () computes the cumulative or moving average of a column, we will calculate something similar in the following scenario.
Scenario is to find out the actual sales in each store, averaged over the current month and the preceding month. Here preceding month would be understood as the previous month to the current month, now what is current month, it is the first row returned after the result set is ordered after partitioning. Following is the query.
SELECT store_no,
sale_month,
actual_sale,
projected_sale,
AVG(actual_sale) OVER (PARTITION BY store_no
ORDER BY projected_sale ROWS 1 PRECEDING)
FROM sales;
And it would return the following result set.
store_no sale_month actual_sale projected_sale Moving Avg(actual_sale)
10 5/1/2007 13,500 14,500 13,500.00
10 1/1/2007 13,500 15,000 13,500.00
20 2/1/2007 11,000 10,000 11,000.00
20 4/1/2007 12,500 15,500 11,750.00
30 3/1/2007 10,500 11,000 10,500.00
30 2/1/2007 10,000 12,000 10,250.00
30 3/1/2007 12,500 15,000 11,250.00
40 4/1/2007 12,500 12,500 12,500.00
40 2/1/2007 11,500 14,000 12,000.00
50 1/1/2007 10,500 12,000 10,500.00
50 5/1/2007 12,500 15,000 11,500.00
60 6/1/2007 11,500 12,000 11,500.00
70 7/1/2007 11,500 15,500 11,500.00
80 7/1/2007 15,500 15,000 15,500.00
90 6/1/2007 15,500 16,000 15,500.00
Note that the moving average number is changing every 2nd row in case there is a difference in between the 1st and 2nd row figure within the same store number. Particularly for the store number 30, you can see that there is a slight increase in the moving average in spite of good increase in actual sales. Projected Sales is also added just to have an idea how the actual sale is performing over the project sale.
COUNT : – To calculate cumulative or moving count.
AVG : – Similarly to compute the moving average use the AVG or MAVG function.
MDIFF : – To see the difference between the current row (column) and the preceding nth row (column) value. If you want to see the sales numbers (increasing or decreasing) on a daily basis, use this function.
MLINREG : – To project the next value in a series based on the data pattern present in the series.
QUANTILE : – To divide the result set into partitions with equal number of rows present in each partition.
RANK : – This function is used to display the ordered rank of all rows in a particular group.
PERCENT_RANK:- To find out relative rank of a row in a group use PERCENT_RANK.
ROW_NUMBER:- To get the sequential row number of the row within its data subset.
MAX/MIN: – To calculate the maximum or minimum cumulative value in a group.
RANK ():- As the function name implies, RANK returns ranking (ordered) of rows based on the number or expression given in the ORDER BY clause. Note that we do not need to give anything inside the RANK function. Just give it to the ORDER BY clause and things would be taken care. Let’s work out a scenario.
Scenario is to find out the sales figure for each store based on their actual sales.
SELECT
store_no
,sale_month
,actual_sale,
RANK () OVER (PARTITION BY store_no ORDER BY actual_sale) as SaleRank
FROM sales;
Note that there is nothing being fed to the function RANK. Result set below shows the sales figure ranked, for each store.
store_no sale_month actual_sale SaleRank
10 05/01/2007 13,500 1
10 01/01/2007 13,500 1
20 02/01/2007 11,000 1
20 04/01/2007 12,500 2
30 02/01/2007 10,000 1
30 03/01/2007 10,500 2
30 03/01/2007 12,500 3
40 02/01/2007 11,500 1
40 04/01/2007 12,500 2
50 01/01/2007 10,500 1
50 05/01/2007 12,500 2
60 06/01/2007 11,500 1
70 07/01/2007 11,500 1
80 07/01/2007 15,500 1
90 06/01/2007 15,500 1
ROW_NUMBER ():- It returns the sequential number of the row within a group, starting with 1 based on the ORDER BY clause. With the above scenario if we replace the RANK with ROW_NUMBER and run the following query:
SELECT
store_no
,sale_month
,actual_sale,
ROW_NUMBER() OVER (PARTITION BY store_no ORDER BY actual_sale) as SaleRank
FROM sales;
The result set we get is similar, only difference is for first two rows we have 1,1 with RANK but 1,2 with ROW_NUMBER. This is because ROW_NUMBER takes into consideration the sequential occurrence of the row in a group which is store_no here.
store_no sale_month actual_sale SaleRank
10 05/01/2007 13,500 1
10 01/01/2007 13,500 2
20 02/01/2007 11,000 1
20 04/01/2007 12,500 2
30 02/01/2007 10,000 1
30 03/01/2007 10,500 2
30 03/01/2007 12,500 3
40 02/01/2007 11,500 1
40 04/01/2007 12,500 2
50 01/01/2007 10,500 1
50 05/01/2007 12,500 2
60 06/01/2007 11,500 1
70 07/01/2007 11,500 1
80 07/01/2007 15,500 1
90 06/01/2007 15,500 1
COUNT ():- As you know when you use this function as aggregate function, it would just return the total number of rows present in the SELECT statement. But the same function when used with OLAP, returns the cumulative or moving count for an expression, let’s see how this is done. Let’s also use the same data set present in the initial article. Scenario is to find out, number of stores selling a particular product.
This is called grouping count and the same can be done in two ways one with taking store_no as count, partitioning by prod_code and the other ways to reverse the logic of taking prod_code as count and partition by store_no.
The following queries would give the desired result.
SELECT store_no, sale_month, actual_sale, prod_code,
COUNT (prod_code) OVER (PARTITION BY store_noROWS BETWEEN UNBOUNDED PRECEDING ANDUNBOUNDED FOLLOWING)
FROM sales;
SELECT store_no, sale_month, actual_sale, prod_code,
COUNT (store_no) OVER (PARTITION BY prod_codeROWS BETWEEN UNBOUNDED PRECEDING ANDUNBOUNDED FOLLOWING)
FROM sales;
store_no sale_month actual_sale prod_code Group Count(store_no)
10 01/01/2007 13,500 Apple 2
60 06/01/2007 11,500 Apple 2
10 05/01/2007 13,500 Apricot 2
40 04/01/2007 12,500 Apricot 2
70 07/01/2007 11,500 Banana 2
20 02/01/2007 11,000 Banana 2
50 05/01/2007 12,500 Grapes 2
80 07/01/2007 15,500 Grapes 2
50 01/01/2007 10,500 Guava 2
40 02/01/2007 11,500 Guava 2
30 03/01/2007 12,500 Mango 2
20 04/01/2007 12,500 Mango 2
30 03/01/2007 10,500 Orange 2
90 06/01/2007 15,500 Orange 2
30 02/01/2007 10,000 Strawberry 1
New Terms:-
ROWS BETWEEN: – This clause is to specify the start and end of the aggregation group.The default aggregation group is ROWS BETWEEN UNBOUNDED PRECEDING ANDUNBOUNDED FOLLOWING (take out this portion from the above query and run).
UNBOUNDED PROCEEDING: – When this is specified, the entire partition that precedes the current row is taken into the aggregation group.
UNBOUNDED FOLLOWING: – Used to define the entire partition that follows the current row.
SUM ():- When used with partition clause, SUM () function returns the cumulative or moving sum of an expression based on how the aggregation group is specified. This function can also be used in place of CSUM and MSUM which are Teradata-specific functions and are discouraged to a great extent. As per recent recommendations from Teradata, usage of ANSI-compliant window function for any new applications is advised. Lets see why Teradata suggests to go for ANSI-compliant SUM () function.
• If you are using SUM function and want to calculate the cumulative SUM, then just specify ORDER BY clause with ROWS UNBOUNDED PRECEDING which would give the same result as that of CSUM function.
• To compute moving average using SUM window function use ORDER BY clause and specify ROWS number PRECEDING (number of rows preceding the current row).
Let’s work out some examples using the data present in the previous section.
Scenario 1:- Calculate the cumulative actual sales per store ordered by sale month:
SELECT store_no, sale_month, actual_sale,
SUM (actual_sale) OVER (PARTITION BY store_no ORDER BY sale_month ROWS UNBOUNDED PRECEDING) as StoreTotal
FROM sales;ORDER BY store_no, sale_month;
Result Set:-
store_no sale_month actual_sale StoreTotal
10 01/01/2007 13,500 13,500
10 05/01/2007 13,500 27,000
20 02/01/2007 11,000 11,000
20 04/01/2007 12,500 23,500
30 02/01/2007 10,000 10,000
30 03/01/2007 12,500 22,500
30 03/01/2007 10,500 33,000
40 02/01/2007 11,500 11,500
40 04/01/2007 12,500 24,000
50 01/01/2007 10,500 10,500
50 05/01/2007 12,500 23,000
60 06/01/2007 11,500 11,500
70 07/01/2007 11,500 11,500
80 07/01/2007 15,500 15,500
90 06/01/2007 15,500 15,500
Scenario 2:- Calculate the total actual sales of fruits by category per each store:
SELECT store_no, prod_code, actual_sale,
SUM (actual_sale) OVER (PARTITION BY prod_code ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as FruitTotal
FROM sales;
Result would be as follows:
store_no prod_code actual_sale FruitTotal
60 Apple 11,500 25,000
10 Apple 13,500 25,000
40 Apricot 12,500 26,000
10 Apricot 13,500 26,000
20 Banana 11,000 22,500
70 Banana 11,500 22,500
50 Grapes 12,500 28,000
80 Grapes 15,500 28,000
50 Guava 10,500 22,000
40 Guava 11,500 22,000
20 Mango 12,500 25,000
30 Mango 12,500 25,000
90 Orange 15,500 26,000
30 Orange 10,500 26,000
30 Strawberry 10,000 10,000
Scenario 3:- Calculate the moving actual sales per month per store.
SELECT store_no, sale_month, actual_sale,
SUM (actual_sale) OVER (PARTITION BY store_no, sale_month ORDER BY actual_sale ROWS 1 PRECEDING) as SaleTrend
FROM sales;
And the Result would be:
store_no sale_month actual_sale SaleTrend
10 01/01/2007 13,500 13,500
10 05/01/2007 13,500 13,500
20 02/01/2007 11,000 11,000
20 04/01/2007 12,500 12,500
30 02/01/2007 10,000 10,000
30 03/01/2007 10,500 10,500
30 03/01/2007 12,500 23,000
40 02/01/2007 11,500 11,500
40 04/01/2007 12,500 12,500
50 01/01/2007 10,500 10,500
50 05/01/2007 12,500 12,500
60 06/01/2007 11,500 11,500
70 07/01/2007 11,500 11,500
80 07/01/2007 15,500 15,500
90 06/01/2007 15,500 15,500
Let’s work out some examples of OLAP functions that are being discussed in the previous articles. Before that make sure you have a table and there is data present init. Follow the below instruction to create a temporary table and insert data into it. While inserting data you might encounter one of the following errors:
3520: A constant value in a query is not valid for column sale_month.
3535: A character string failed conversion to a numeric value.
2665: Invalid date.
All of these errors are due to the date value that you are inserting. If you encounter any of these errors, try permutation and combination of the date value and see if you can get rid of the error. If you still can not then find out from your sys admin what date format in allowed while you are inserting dates. Here in this example the format is ‘yyyy-mm-dd’. All of the below statements are tested in SQL assistant. Create a Volatile Table “Sales” as follows:
CREATE SET VOLATILE TABLE <username>.sales,
NO FALLBACK,
CHECKSUM = DEFAULT,
LOG
(
“store_no” INTEGER,
“sale_month” DATE,
“prod_code” CHAR (20) CHARACTER SET LATIN NOT CASESPECIFIC,
“projected_sale” INTEGER,
“actual_sale” INTEGER
)
PRIMARY INDEX ( “store_no” )
ON COMMIT PRESERVE ROWS;
Insert data as follows:
INSERT INTO sales VALUES(10,’2007-01-01′,’Apple’,15000,13500);
INSERT INTO sales VALUES(20,’2007-02-01′,’Banana’,10000,11000);
INSERT INTO sales VALUES(30,’2007-03-01′,’Orange’,11000,10500);
INSERT INTO sales VALUES(40,’2007-04-01′,’Apricot’,12500,12500);
INSERT INTO sales VALUES(50,’2007-05-01′,’Grapes’,15000,12500);
INSERT INTO sales VALUES(60,’2007-06-01′,’Apple’,12000,11500);
INSERT INTO sales VALUES(70,’2007-07-01′,’Banana’,15500,11500);
INSERT INTO sales VALUES(80,’2007-07-01′,’Grapes’,15000,15500);
INSERT INTO sales VALUES(90,’2007-06-01′,’Orange’,16000,15500);
INSERT INTO sales VALUES(10,’2007-05-01′,’Apricot’,14500,13500);
INSERT INTO sales VALUES(20,’2007-04-01′,’Mango’,15500,12500);
INSERT INTO sales VALUES(30,’2007-03-01′,’Mango’,15000,12500);
INSERT INTO sales VALUES(40,’2007-02-01′,’Guava’,14000,11500);
INSERT INTO sales VALUES(50,’2007-01-01′,’Guava’,12000,10500);
INSERT INTO sales VALUES(30,’2007-02-01′,’Strawberry’,12000,10000);
Now take the example of AVG () function. As you know AVG () computes the cumulative or moving average of a column, we will calculate something similar in the following scenario.
Scenario is to find out the actual sales in each store, averaged over the current month and the preceding month. Here preceding month would be understood as the previous month to the current month, now what is current month, it is the first row returned after the result set is ordered after partitioning. Following is the query.
SELECT store_no,
sale_month,
actual_sale,
projected_sale,
AVG(actual_sale) OVER (PARTITION BY store_no
ORDER BY projected_sale ROWS 1 PRECEDING)
FROM sales;
And it would return the following result set.
store_no sale_month actual_sale projected_sale Moving Avg(actual_sale)
10 5/1/2007 13,500 14,500 13,500.00
10 1/1/2007 13,500 15,000 13,500.00
20 2/1/2007 11,000 10,000 11,000.00
20 4/1/2007 12,500 15,500 11,750.00
30 3/1/2007 10,500 11,000 10,500.00
30 2/1/2007 10,000 12,000 10,250.00
30 3/1/2007 12,500 15,000 11,250.00
40 4/1/2007 12,500 12,500 12,500.00
40 2/1/2007 11,500 14,000 12,000.00
50 1/1/2007 10,500 12,000 10,500.00
50 5/1/2007 12,500 15,000 11,500.00
60 6/1/2007 11,500 12,000 11,500.00
70 7/1/2007 11,500 15,500 11,500.00
80 7/1/2007 15,500 15,000 15,500.00
90 6/1/2007 15,500 16,000 15,500.00
Note that the moving average number is changing every 2nd row in case there is a difference in between the 1st and 2nd row figure within the same store number. Particularly for the store number 30, you can see that there is a slight increase in the moving average in spite of good increase in actual sales. Projected Sales is also added just to have an idea how the actual sale is performing over the project sale.
No comments:
Post a Comment